Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libmem: implement policy-agnostic memory allocation/accounting. #332

Merged
merged 10 commits into from
Oct 7, 2024

Conversation

klihub
Copy link
Collaborator

@klihub klihub commented Jun 18, 2024

This patch series

  • implements a policy-agnostic library for memory allocation/accounting
  • replaces existing memory accounting in the topology-aware policy with libmem
  • plugs in libmem-based memory accounting and allocation to the balloons policy

For a more detailed description of the main ideas, core concepts, and some high level
implementation details, see the included package documentation.

TODO items:

  • get topology-aware: don't ignore HBM memory nodes without close CPUs. #329 merged then rebase
  • add missing ensureNormalMemory() during initial zone selection
  • allow for both strict and preferred memory type selection
  • add an interface for overriding builtin zone expansion and overflow resolution
  • make sure memory type preference is consistently either a preference or a requirement
  • add a description of how the allocator works/describe the basic idea
  • comment/document zone expansion, zone overflow resolution
  • clean up debug-logging to be actually helpful
  • replace inertia with a more granular priority, with pre-defined priorities for QoS classes

Remaining questions related to this initial implementation:

  • Are the builtin defaults good enough ?

@klihub klihub requested a review from askervin June 18, 2024 07:30
@klihub klihub force-pushed the devel/libmem branch 2 times, most recently from d7744f7 to 6632815 Compare June 19, 2024 12:20
Copy link
Collaborator

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

....starting review. I could not yet follow through the whole path if it is or not possible that PreserveContainer requests can return zones different from originals. But if they do... for instance if a single zone of size 32 GB is the only zone that is allowed for two preserved-memory containers, each requesting 32 GB of RAM, then I'd still assume that whoever did this preservation must know what he is doing. If he prefers oom kill of his containers rather than letting them access memory out of the zone, then so be it, we won't touch pinning.

@klihub klihub force-pushed the devel/libmem branch 15 times, most recently from 8dbbe4f to 9f6da52 Compare July 3, 2024 20:13
@klihub klihub requested a review from askervin July 3, 2024 20:19
@klihub klihub force-pushed the devel/libmem branch 7 times, most recently from 0a54715 to d61ab79 Compare July 4, 2024 08:39
@klihub klihub marked this pull request as ready for review July 4, 2024 08:41
@klihub klihub force-pushed the devel/libmem branch 4 times, most recently from 3ed379c to 515f99a Compare September 11, 2024 20:00
@klihub klihub marked this pull request as draft September 18, 2024 07:41
@klihub klihub marked this pull request as ready for review September 18, 2024 12:41
@klihub klihub force-pushed the devel/libmem branch 2 times, most recently from 1b6292b to b03dd81 Compare October 1, 2024 12:52
klihub added a commit to klihub/nri-plugins that referenced this pull request Oct 1, 2024
This will conflict if/when we merge containers#332/libmem. This will need to
get dropped, rebased on latest main, the checked and fixed again.
So I'll just keep it at the tip until then...

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the devel/libmem branch 4 times, most recently from 7461f05 to 015c985 Compare October 3, 2024 11:46
klihub and others added 10 commits October 3, 2024 16:31
Initial implementation of a policy agnostic memory accounting
and allocation library.

Signed-off-by: Krisztian Litkey <[email protected]>
Add a brief description of libmem, its basic ideas and core
concepts, as package level documentation.

Signed-off-by: Krisztian Litkey <[email protected]>
Cut out the original memory accounting and allocation code.
Plug in a libmem-based memory allocator instead.

Signed-off-by: Krisztian Litkey <[email protected]>
Update unit tests after libmem conversion.

Signed-off-by: Krisztian Litkey <[email protected]>
Plug in libmem-based memory allocation (and accounting).

Signed-off-by: Krisztian Litkey <[email protected]>
Add support for per balloon type memory configuration and per
container overrides using pod annotations. Pass configured or
annotated memory types to libmem for allocation.

TODO(klihub): per balloon configuration still missing (?)

Co-authored-by: Krisztian Litkey <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
Add first e2e tests for topology-aware policy memory allocation
and type control.

Co-authored-by: Krisztian Litkey <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
"MEMTYPE=x create ..." in test scripts creates a guaranteed or
burstable pod with "memory-type...: x" annotation effective for all
containers in the pod.

Signed-off-by: Antti Kervinen <[email protected]>
Adds fuzz test generator script, model and runner for topology-aware
policy reliability tests on HBM+DRAM+PMEM platform.

Signed-off-by: Antti Kervinen <[email protected]>
Copy link
Collaborator

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thank you for this huge work, @klihub!

I think we should take it in and start developing on top of it.

@klihub klihub requested a review from fmuyassarov October 7, 2024 08:20
@klihub
Copy link
Collaborator Author

klihub commented Oct 7, 2024

LGTM.

Thank you for this huge work, @klihub!

I think we should take it in and start developing on top of it.

Yeah, I tend to think so, too. Then we'll need to fix bugs and polish the turd further as we go. I'm just unable to focus enough to get this into any better shape now.

@askervin askervin merged commit 7399adb into containers:main Oct 7, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants